
`Series` Data Structure

A Series in pandas is a one-dimensional, array-like object. Like a Python list, it stores items in order; like a dictionary, it lets you access each item through a label (its index). All values in a Series share a single data type (dtype), which pandas infers from the data.

  • List-like: Ordered collection of items.
  • Dictionary-like: Access items using labels.
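Both kinds of access can be sketched in a few lines. The scores and names below are made-up illustration data, not part of the original example set.

```python
import pandas as pd

# Hypothetical example data: three scores labeled by student name.
scores = pd.Series([88, 92, 75], index=['Alice', 'Jack', 'Molly'])

# List-like: values keep their order and can be reached by position.
first_score = scores.iloc[0]

# Dictionary-like: values can be looked up by label.
jack_score = scores['Jack']
also_jack = scores.loc['Jack']

print(first_score, jack_score, also_jack)
```

Using .iloc for positions and .loc (or plain brackets) for labels keeps the two access styles unambiguous, which matters most when the labels themselves are integers.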

Structure

A Series consists of two main components:

  • Index: Similar to keys in a dictionary.
  • Data: Actual values stored in the series.

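The values also carry an optional label, available through the .name attribute. Pandas uses this name as the column label when several Series are combined into one table, which makes it useful for operations like merging multiple columns of data.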
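As a small sketch of how the name carries over when Series are combined, the example below uses pd.concat with two named Series. The data and the names 'grade' and 'subject' are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data: two Series that share the default integer index.
grades = pd.Series([88, 92, 75], name='grade')
subjects = pd.Series(['Physics', 'Chemistry', 'English'], name='subject')

print(grades.name)  # grade

# Side-by-side concatenation uses each Series name as the column label.
combined = pd.concat([grades, subjects], axis=1)
print(combined.columns.tolist())  # ['grade', 'subject']
```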

Creating a Series

To start, import pandas:

import pandas as pd

From a List

You can create a Series by passing a list of values. Pandas automatically assigns an integer index starting at 0 and, unless you supply one, sets the name of the series to None.

Example

students = ['Alice', 'Jack', 'Molly']
series_students = pd.Series(students)
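The defaults described above can be checked directly. This is a minimal sketch that repeats the list example and inspects the index and name.

```python
import pandas as pd

students = ['Alice', 'Jack', 'Molly']
series_students = pd.Series(students)

# The default index counts up from 0.
print(series_students.index.tolist())  # [0, 1, 2]

# No name was supplied, so the attribute is None.
print(series_students.name)  # None
```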

Data Types

  • String List: The Series type is set to object in pandas 2.x and earlier. Newer releases may infer a dedicated string dtype instead.

    students = ['Alice', 'Jack', 'Molly']
    pd.Series(students)
  • Integer List: The Series type is set to int64.

    numbers = [1, 2, 3]
    pd.Series(numbers)

Handling Missing Data

  • Strings with None: The type is the same as for a list of plain strings (object in pandas 2.x and earlier, where None is stored as-is).

    students = ['Alice', 'Jack', None]
    pd.Series(students)
  • Numbers with None: Pandas converts None to NaN and sets the type to float64.

    numbers = [1, 2, None]
    pd.Series(numbers)
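The change of type caused by a missing number can be seen by comparing dtypes. The last part of the sketch uses the nullable 'Int64' extension dtype, which is not covered in the text above and is shown only as an optional alternative when integers must be preserved.

```python
import pandas as pd

whole = pd.Series([1, 2, 3])
with_gap = pd.Series([1, 2, None])

print(whole.dtype)     # int64
print(with_gap.dtype)  # float64, because NaN is a floating-point value

# Optional alternative: the nullable 'Int64' extension dtype keeps integers
# and represents the gap as pd.NA instead of NaN.
nullable = pd.Series([1, 2, None], dtype='Int64')
print(nullable.dtype)  # Int64
```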

NaN vs. None

  • NaN is not equivalent to None, and NaN does not even compare equal to itself, so equality tests return False. Use np.isnan (or pd.isna) to test for it instead.

    import numpy as np
    np.nan == None # False
    np.nan == np.nan # False
    np.isnan(np.nan) # True
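Because equality tests fail for NaN, missing values in a Series are usually found with a dedicated check. The sketch below uses pd.isna and Series.isna, which treat both None and NaN as missing.

```python
import numpy as np
import pandas as pd

# pd.isna accepts scalars and recognizes both kinds of missing value.
print(pd.isna(None))    # True
print(pd.isna(np.nan))  # True

# On a Series, isna() returns a boolean mask with one entry per value.
students = pd.Series(['Alice', 'Jack', None])
mask = students.isna()
print(mask.tolist())  # [False, False, True]

# The mask can be inverted to keep only the values that are present.
present = students[~mask]
print(present.tolist())  # ['Alice', 'Jack']
```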

Creating Series from Dictionaries

A Series can also be created from dictionary data, where the keys become the index values.

Example

students_scores = {'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'}
s = pd.Series(students_scores)

Index and Data Types

  • The index can be accessed using the .index attribute.
  • The data type (dtype) of the series and index is inferred automatically.

Example

s.index
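As a brief sketch of what the index looks like for a Series built from a dictionary, the example below lists the labels and shows that membership tests with "in" check the index, the way they check keys in a dictionary.

```python
import pandas as pd

s = pd.Series({'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'})

# The dictionary keys become the index labels, in insertion order.
labels = s.index.tolist()
print(labels)  # ['Alice', 'Jack', 'Molly']

# Membership tests look at the index, not at the values.
print('Alice' in s)    # True
print('Physics' in s)  # False

# The inferred dtype of the index depends on the pandas version.
print(s.index.dtype)
```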

Complex Data Types

You can store compound Python objects such as tuples in a Series. Each tuple is kept as a single element, and the Series type is set to object.

Example

students = [("Alice", "Brown"), ("Jack", "White"), ("Molly", "Green")]
pd.Series(students)
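A short sketch of how the tuples are stored: each one remains a single element, and a per-element function can pull a field out. The use of map with a lambda here is one possible approach, not the only one.

```python
import pandas as pd

students = [("Alice", "Brown"), ("Jack", "White"), ("Molly", "Green")]
names = pd.Series(students)

# Each element is still a whole tuple, so the Series has three items.
print(len(names))            # 3
print(type(names.iloc[0]))   # <class 'tuple'>

# Extract the first field of every tuple with an element-wise function.
first_names = names.map(lambda pair: pair[0])
print(first_names.tolist())  # ['Alice', 'Jack', 'Molly']
```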

Custom Index

You can explicitly pass an index when creating a Series.

Example

s = pd.Series(['Physics', 'Chemistry', 'English'], index=['Alice', 'Jack', 'Molly'])

Handling Mismatched Index and Dictionary Keys

If you pass both a dictionary and an index, pandas keeps only the labels listed in the index. Dictionary keys that are not in the index are dropped, and index labels with no matching key get a missing value (NaN).

Example

students_scores = {'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'}
s = pd.Series(students_scores, index=['Alice', 'Molly', 'Sam'])
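The outcome of this example can be verified with a few checks. The sketch below rebuilds the same Series and confirms which labels survive and which value is missing.

```python
import pandas as pd

students_scores = {'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'}
s = pd.Series(students_scores, index=['Alice', 'Molly', 'Sam'])

# Only the requested labels appear, in the order given.
print(s.index.tolist())  # ['Alice', 'Molly', 'Sam']

# 'Jack' was in the dictionary but not in the index, so it is dropped.
print('Jack' in s)       # False

# 'Sam' was in the index but not in the dictionary, so its value is missing.
print(pd.isna(s['Sam'])) # True
```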